Do GMM Phoneme Classifiers Perceive Synthetic Sibilants as Humans Do?
Abstract
This study presents a psycholinguistically motivated evaluation method for phoneme classifiers that uses non-categorical perceptual data elicited in a Japanese sibilant-matching 2AFC task. Probability values of a perceptual [s]-[S] boundary, obtained from 42 speakers over a 7-step synthetic [s]-[S] continuum, were compared to probability estimates of Gaussian mixture models (GMMs) of Japanese [s] and [S]. The GMMs, trained on the Corpus of Spontaneous Japanese, differed in feature vectors (MFCC, PLP, acoustic features), covariance matrix types (full, tied, diagonal, spherical), and numbers of mixtures (1-20). Ten-fold cross-validation showed that GMMs trained on MFCC features had the best sibilant classification accuracies (87.4-90.4%), but their correlations with human perceptual data were inconclusive (0.35-0.98). Acoustic feature-based GMMs with tied covariance matrices had near human-like perception of the synthetic stimuli (0.957-0.996), but their classification performance was poor (71.3-80.4%). Models trained on perceptual linear prediction (PLP) features were on par with the acoustic feature-based models in terms of correlation with the perceptual experiment (0.884-0.995), while losing slightly on classification performance (86.1-88.9%) compared to MFCC models. Across-the-board correlation tests and mixed-effects models confirmed that GMMs with better sibilant classification performance produced more human-like probability estimates on the synthetic sibilant continuum.
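The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes scikit-learn's `GaussianMixture` as the GMM implementation, replaces corpus features with toy 2-D Gaussian data, builds a hypothetical 7-step continuum by linear interpolation between the two class means, and uses an invented logistic curve as a placeholder for the human response proportions. Cross-validation and the sweep over feature types and mixture counts are left out.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy 2-D "features" standing in for MFCC/PLP/acoustic vectors (assumption).
mean_s, mean_sh = np.array([-2.0, 0.0]), np.array([2.0, 0.0])
X_s = rng.normal(mean_s, 1.0, size=(500, 2))    # pretend [s] tokens
X_sh = rng.normal(mean_sh, 1.0, size=(500, 2))  # pretend [S] tokens


def fit_gmm(X, n_components=2, covariance_type="tied"):
    """Fit one class-conditional GMM.

    covariance_type may be "full", "tied", "diag", or "spherical".
    """
    gmm = GaussianMixture(
        n_components=n_components,
        covariance_type=covariance_type,
        random_state=0,
    )
    return gmm.fit(X)


gmm_s, gmm_sh = fit_gmm(X_s), fit_gmm(X_sh)


def posterior_sh(X):
    """P([S] | x) under equal class priors, computed stably in log space."""
    diff = gmm_s.score_samples(X) - gmm_sh.score_samples(X)
    return np.exp(-np.logaddexp(0.0, diff))


# Hypothetical 7-step continuum: linear interpolation between class means.
steps = np.linspace(0.0, 1.0, 7)[:, None]
continuum = (1.0 - steps) * mean_s + steps * mean_sh
p_model = posterior_sh(continuum)

# Placeholder "human" proportions of [S] responses: an invented logistic
# curve, NOT the data reported in the study.
p_human = 1.0 / (1.0 + np.exp(-1.5 * (np.arange(1, 8) - 4)))

# Pearson correlation between model and (placeholder) human curves.
r = np.corrcoef(p_model, p_human)[0, 1]
print("model P([S]) per step:", np.round(p_model, 3))
print("Pearson r:", round(float(r), 3))
```

Changing `covariance_type` or `n_components` in `fit_gmm` and recomputing `r` mirrors, in miniature, the model comparison the abstract reports.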
Similar resources
Phoneme-Discriminative Features for Dysarthric Speech Conversion
We present in this paper a Voice Conversion (VC) method for a person with dysarthria resulting from athetoid cerebral palsy. VC is being widely researched in the field of speech processing because of increased interest in using such processing in applications such as personalized Text-To-Speech systems. A Gaussian Mixture Model (GMM)-based VC method has been widely researched and Partial Least ...
On the perception of "segmental intonation": F0 context effects on sibilant identification in German
In normal modally voiced utterances, voiceless fricatives like [s], [ʃ], [f], and [x] vary such that their aperiodic pitch impressions mirror the pitch level of the adjacent F0 contour. For instance, if the F0 contour creates a high or low pitch context, then the aperiodic pitch impression of the fricative in this context will also be high or low. This context-matching effect has been termed “se...
Speaker recognition using phoneme-specific GMMs
This paper compares three approaches to building phoneme-specific Gaussian mixture model (GMM) speaker recognition systems on the NIST 2003 Extended Data Evaluation to a baseline GMM system covering all of the phonemes. The individual performance of any given phoneme-specific GMM system falls below the performance of the baseline GMM, but fusing the top 40 performing scores of the individual ph...
Improving the discrimination between native accents when recorded over different channels
Acoustic differences between native accents may prove to be too subtle for straightforward brute force techniques such as blindly clustered Gaussian mixture model (GMM) classifiers to yield satisfactory discrimination performance while these methods work well for classifying more pronounced differences such as language, gender or channel. In this paper it is shown that small channel differences...
The Sequential GMM: A Gaussian Mixture Model Based Speaker Verification System that Captures Sequential Information
This report presents a novel speaker verification system that generates a new feature set that captures long duration speaker identifying characteristics while taking advantage of the well-established and well-studied Gaussian Mixture Model system (GMM). Much of the innovation in the system is contained in the intelligent exploitation of traditional cepstral features such that te...
Publication date: 2016